XGBoost and LightGBM



Faster Boosting with Smaller Memory

Neural Information Processing Systems

State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees that achieves a significant speedup over XGBoost and LightGBM, especially when memory is small. The speedup comes from a combination of three techniques: early stopping, effective sample size, and stratified sampling. Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory.
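The abstract names effective sample size as one of the three techniques. Below is a minimal sketch that assumes the common Kish definition n_eff = (Σw)² / Σw² and an exponential (AdaBoost-style) weight update. The function names and the 0.5 resampling threshold are hypothetical and do not come from the paper. The intuition is as follows. When boosting concentrates weight on a few hard examples, n_eff falls far below the number of examples held in memory. That signals the in-memory sample no longer represents the weighted distribution well and should be redrawn.

```python
import math

def effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / total_sq if total_sq > 0 else 0.0

def needs_resample(weights, threshold=0.5):
    """True when the ESS falls below `threshold` times the number of examples.

    The 0.5 threshold is a hypothetical choice for illustration.
    """
    return effective_sample_size(weights) < threshold * len(weights)

def reweight(weights, margins):
    """Exponential (AdaBoost-style) update: w_i <- w_i * exp(-y_i * h(x_i)).

    `margins[i]` holds y_i * h(x_i) for the weak rule just added.
    """
    return [w * math.exp(-m) for w, m in zip(weights, margins)]

# Example: weight drifts onto one example that every weak rule gets wrong.
w = [1.0] * 8
for _ in range(3):
    # hypothetical margins: example 0 is misclassified, the rest are correct
    w = reweight(w, [-1.0] + [1.0] * 7)
print(effective_sample_size(w), needs_resample(w))
```

Uniform weights give n_eff equal to the number of examples. In the example, almost all of the weight sits on one example, so n_eff drops to roughly 1 and a resample is triggered.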



Enhancing the Product Quality of the Injection Process Using eXplainable Artificial Intelligence

Hong, Jisoo, Hong, Yongmin, Baek, Jung-Woo, Kang, Sung-Woo

arXiv.org Artificial Intelligence

The injection molding process is a traditional technique for making products in industries such as electronics and automobiles by solidifying liquid resin in molds. Although the process does not create the main parts of engines or semiconductors, it determines the final form of the products. Recent research has continued to work on reducing the defect rate of the injection molding process. This study proposes an optimal injection molding process control system that reduces the defect rate of injection-molded products using XAI (eXplainable Artificial Intelligence) approaches. Boosting algorithms (XGBoost and LightGBM) are used as tree-based classifiers to predict whether each product is normal or defective. The main features for controlling the process are extracted with SHapley Additive exPlanations (SHAP). Individual conditional expectation (ICE) analysis then identifies the optimal control range of these features. To validate the methodology, the case study uses an actual injection molding AI manufacturing dataset provided by KAMP (Korea AI Manufacturing Platform). The results show that the defect rate decreases from 1.00% (the original defect rate) to 0.21% with XGBoost and to 0.13% with LightGBM.
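The two XAI steps in the abstract are SHAP to rank features and ICE to find a control range. Both can be illustrated without any ML library. This is a sketch under stated assumptions:

- `defect_prob` is a hypothetical toy model standing in for the trained XGBoost or LightGBM classifier.
- The feature names `pressure` and `temp` are hypothetical, not KAMP dataset columns.
- Shapley values are computed by brute-force enumeration, with features outside a coalition set to a baseline value. Real SHAP tooling uses much faster tree-specific algorithms, so this enumeration is only practical for a handful of features.
- `acceptable_values` simply keeps the grid points where every ICE curve stays under a chosen defect-probability limit.

```python
from itertools import combinations
from math import factorial

def defect_prob(x):
    """Hypothetical toy model: defect risk grows as each setting leaves its ideal value."""
    return 0.01 + 0.002 * abs(x["pressure"] - 100) + 0.001 * abs(x["temp"] - 60)

def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    feats = list(x)
    n = len(feats)
    phi = {}
    for f in feats:
        others = [g for g in feats if g != f]
        total = 0.0
        for r in range(n):
            # Shapley weight for coalitions of size r that exclude f
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                without_f = {g: (x[g] if g in subset else baseline[g]) for g in feats}
                with_f = dict(without_f, **{f: x[f]})
                total += weight * (predict(with_f) - predict(without_f))
        phi[f] = total
    return phi

def ice_curves(predict, rows, feature, grid):
    """One curve per row: vary `feature` over `grid`, keep the other features fixed."""
    return [[predict(dict(row, **{feature: v})) for v in grid] for row in rows]

def acceptable_values(curves, grid, max_prob):
    """Grid values at which every ICE curve stays at or below `max_prob`."""
    return [v for j, v in enumerate(grid) if all(c[j] <= max_prob for c in curves)]

# Which feature explains the elevated risk of this product?
x = {"pressure": 120, "temp": 60}
base = {"pressure": 100, "temp": 60}
phi = shapley_values(defect_prob, x, base)

# Which pressure settings keep every product under the (hypothetical) risk limit?
rows = [{"pressure": 100, "temp": 60}, {"pressure": 100, "temp": 70}]
grid = [80, 90, 100, 110, 120]
ok = acceptable_values(ice_curves(defect_prob, rows, "pressure", grid), grid, 0.045)
```

Here all of the extra risk is attributed to `pressure`. The ICE curves suggest keeping pressure between 90 and 110 under this toy model.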


Biomarker based Cancer Classification using an Ensemble with Pre-trained Models

Lee, Chongmin, Kim, Jihie

arXiv.org Machine Learning

Certain cancer types, such as pancreatic cancer, are difficult to detect at an early stage. This underscores the importance of discovering the causal relationship between biomarkers and cancer in order to identify cancer efficiently. By enabling the detection and monitoring of specific biomarkers through a non-invasive method, liquid biopsies improve the precision and efficacy of medical interventions and support the move toward personalized healthcare. Several machine learning algorithms, such as Random Forest and SVM, have been used for classification, but they are inefficient because they require hyperparameter tuning. We leverage a meta-trained Hyperfast model to classify cancer. It achieves the highest AUC (0.9929) and is more robust than other ML algorithms, especially on highly imbalanced datasets, across several binary classification tasks (e.g., breast invasive carcinoma: BRCA vs. non-BRCA). We also propose a novel ensemble model that combines the pre-trained Hyperfast model, XGBoost, and LightGBM for multi-class classification. The ensemble achieves an incremental increase in accuracy (0.9464) while using only 500 PCA features, in contrast to previous studies that used more than 2,000 features to obtain similar results.
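The abstract does not say how the Hyperfast, XGBoost, and LightGBM outputs are combined. One common way to ensemble classifiers is soft voting: average each model's predicted class probabilities and take the most probable class. The sketch below assumes soft voting with optional per-model weights. The probability vectors are placeholders, not outputs of the authors' models.

```python
def soft_vote(prob_rows, weights=None):
    """Weighted average of per-class probabilities from several models.

    prob_rows[m][c] is model m's probability for class c on one sample.
    Weights default to equal and should sum to 1 so the result stays a distribution.
    Returns (averaged probabilities, index of the most probable class).
    """
    m = len(prob_rows)
    if weights is None:
        weights = [1.0 / m] * m
    n_classes = len(prob_rows[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_rows)) for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)
    return avg, label

# Placeholder outputs standing in for Hyperfast, XGBoost, and LightGBM on one sample.
hyperfast_p = [0.6, 0.3, 0.1]
xgb_p = [0.2, 0.7, 0.1]
lgbm_p = [0.3, 0.6, 0.1]
probs, label = soft_vote([hyperfast_p, xgb_p, lgbm_p])
```

Soft voting lets a confident model outweigh two lukewarm ones. Hard (majority) voting over predicted labels would be the simpler alternative.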


XGBoost vs LightGBM on a High Dimensional Dataset


I recently completed a multi-class classification problem given as a take-home assignment for a data scientist position. It was a good opportunity to compare two state-of-the-art implementations of gradient boosted decision trees, XGBoost and LightGBM. Both consistently rank among the best-performing machine learning models. The dataset contains more than 60,000 observations and 103 numerical features, and the target variable has 9 classes.
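To compare the two libraries on a 9-class target, both need a multi-class objective. The sketch below uses each library's native training API. The hyperparameter values are illustrative and untuned, not the ones used in the post. The library imports sit inside `train_both`, so the parameter dictionaries can be inspected without XGBoost or LightGBM installed. Both libraries expect integer labels in the range 0..8.

```python
NUM_CLASSES = 9  # the target variable in the post has 9 classes

# XGBoost native-API parameters: output one probability per class.
xgb_params = {
    "objective": "multi:softprob",
    "num_class": NUM_CLASSES,
    "eval_metric": "mlogloss",
    "tree_method": "hist",  # histogram-based split finding
    "eta": 0.1,             # illustrative learning rate
    "max_depth": 6,         # XGBoost grows trees depth-wise by default
}

# LightGBM parameters for the same task.
lgb_params = {
    "objective": "multiclass",
    "num_class": NUM_CLASSES,
    "metric": "multi_logloss",
    "learning_rate": 0.1,   # same rate as XGBoost for a like-for-like comparison
    "num_leaves": 31,       # LightGBM grows trees leaf-wise, so this bounds complexity
}

def train_both(X, y, rounds=200):
    """Train both libraries on the same data; y must hold labels 0..NUM_CLASSES-1."""
    import lightgbm as lgb
    import xgboost as xgb

    xgb_model = xgb.train(xgb_params, xgb.DMatrix(X, label=y), num_boost_round=rounds)
    lgb_model = lgb.train(lgb_params, lgb.Dataset(X, label=y), num_boost_round=rounds)
    return xgb_model, lgb_model
```

Because XGBoost grows depth-wise and LightGBM leaf-wise by default, `max_depth` and `num_leaves` are the knobs to align if the comparison should control for tree size.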


Faster Boosting with Smaller Memory

Alafate, Julaiti, Freund, Yoav S.

Neural Information Processing Systems

State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees that achieves a significant speedup over XGBoost and LightGBM, especially when memory is small. The speedup comes from a combination of three techniques: early stopping, effective sample size, and stratified sampling. Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory. Published at the Neural Information Processing Systems Conference.
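Stratified sampling is the third technique named in the abstract. The sketch below shows one way to draw a sample with probability proportional to boosting weight: group examples into strata by powers of two of their weight, then sample in two steps.

1. Pick a stratum in proportion to its total weight.
2. Rejection-sample inside that stratum.

Within a stratum, weights differ by less than a factor of 2, so each rejection step accepts with probability at least 1/2. The power-of-two strata and all function names are assumptions for illustration, not the authors' implementation. The sketch also assumes at least one example has positive weight.

```python
import math
import random
from collections import defaultdict

def build_strata(weights):
    """Group example indices by floor(log2(w)); weights in a stratum differ by < 2x."""
    strata = defaultdict(list)
    for i, w in enumerate(weights):
        if w > 0:  # zero-weight examples can never be drawn
            strata[math.floor(math.log2(w))].append(i)
    return strata

def stratified_weighted_sample(weights, k, rng=random):
    """Draw k indices (with replacement) with probability proportional to weight."""
    strata = build_strata(weights)
    keys = list(strata)
    stratum_mass = [sum(weights[i] for i in strata[s]) for s in keys]
    sample = []
    for _ in range(k):
        # 1) choose a stratum with probability proportional to its total weight
        s = rng.choices(keys, weights=stratum_mass, k=1)[0]
        upper = 2.0 ** (s + 1)  # every weight in stratum s is below this bound
        # 2) rejection-sample within the same stratum, accepting i with prob w_i / upper,
        #    so that, given stratum s, index i is returned with prob w_i / mass(s)
        while True:
            i = rng.choice(strata[s])
            if rng.random() < weights[i] / upper:
                sample.append(i)
                break
    return sample

print(stratified_weighted_sample([1.0, 0.5, 4.0, 0.0], 5, random.Random(42)))
```

Retrying inside the chosen stratum, rather than re-picking the stratum after a rejection, keeps the overall draw exactly proportional to weight.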


Malware Classification using Machine Learning


If you love to explore large and challenging datasets, you should give Microsoft Malware Classification a try. Before diving deep into the problem, here are a few points on what you can expect to learn from this: In the past few years, the malware industry has grown so rapidly that syndicates invest heavily in technologies to evade traditional protection. This forces anti-malware groups and communities to build more robust software to detect and terminate these attacks. The major part of protecting a computer system from a malware attack is identifying whether a given file or piece of software is malware. We can map this business problem to a multi-class classification problem in which we predict the class of each given byte file among nine categories (Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda, Tracur, Kelihos_ver1, Obfuscator.ACY, Gatak). Constraints: we need to provide class probabilities, wrongly classified class labels should be penalized (which is why log loss was chosen as the KPI), and there should be some latency bound.
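Multi-class log loss rewards well-calibrated class probabilities and heavily penalizes confident wrong predictions, which is why it fits the constraints above. Below is a minimal pure-Python sketch. The clipping value `eps=1e-15` is an assumed, commonly used choice that keeps log(0) from occurring. A model that spreads probability uniformly over the 9 classes scores ln(9) ≈ 2.197, a useful baseline that any real model should beat.

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean of -log(p_true) over samples; y_true holds class indices 0..K-1."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = probs[label] / sum(probs)      # renormalise each row to sum to 1
        p = min(max(p, eps), 1.0 - eps)    # clip so log(0) never occurs
        total -= math.log(p)
    return total / len(y_true)

NUM_CLASSES = 9  # Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda, Tracur, Kelihos_ver1, Obfuscator.ACY, Gatak
uniform = [[1.0 / NUM_CLASSES] * NUM_CLASSES for _ in range(4)]
baseline = multiclass_log_loss([0, 3, 5, 8], uniform)  # approximately ln(9)
```

A single confident mistake (probability near 0 on the true class) costs about 34.5 at this clipping level, far more than several mildly uncertain correct predictions.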